A meta-analytical approach regarding school effectiveness: the size of school
effects and the effect size of educational leadership



ABSTRACT 

In the field of school effectiveness research there is growing uncertainty about questions
such as which factors are responsible for the difference between effective and non-effective
schools, what the real contribution of the significant factors is, what the true sizes of
school effects are, and to what extent school effectiveness results can be generalized. One
way to address these questions is to perform a meta-analysis. This paper deals with the
background of a meta-analytical approach as conducted by the University of Twente.
Furthermore, some preliminary results are presented. These results relate to the size
of reported school effects and to the effect size of the variable educational leadership. With
regard to the first, results are presented for different sectors and subjects, together with an
estimate of the boundaries between which the "true" effects may lie. Prior to the presentation
of these results, some persistent problems in the measurement of school effects are dealt
with. More specifically, attention is paid to problems such as measurement error, the
specification of relevant levels and the choice of covariates. With regard to educational
leadership, attention is paid to the question whether educational leadership has
a significant relationship with student achievement and what the effect size of this variable
might be.

INTRODUCTION 

School effectiveness research has by now a long tradition. A popular view is to regard it
as a reaction to the rather pessimistic views on teachers, schools and education in general
brought forward by disappointing research results, in particular the work of influential
researchers like Coleman et al. (1966) and Jencks (1972). In this respect the work of
Edmonds (1979) and Brookover, Beady, Flood, and Schweitzer (1979) in the United States
and of Rutter, Maughan, Mortimore and Ouston (1979) in the United Kingdom is often seen
as an important starting point for school effectiveness research.

In the United States in particular, a great deal of work has been done by researchers
building on the work of Edmonds and Brookover and associates. Around the mid-1980s
these studies in turn led to reviews of school effectiveness research, in which five or more
factors were frequently cited as being responsible for differences between effective and
non-effective schools (Purkey & Smith, 1983; Wilson & Corcoran, 1983). In a sense, these
reviews and the factors mentioned in them formed one of the basic tenets of the
school effectiveness community of the 1980’s.

However, things have changed drastically in the last decade, mainly due to the
increased internationalization of the school effectiveness community since the 1980’s. Where
eight or ten years ago generally isolated communities of researchers in different cultures
(especially in the United Kingdom, the United States, Australia, the Netherlands, Canada,
Scandinavia) were working on the subject of school effectiveness, nowadays there is an
international network whose members make use of each other’s concepts and build on results
stemming from cultures other than their own.

The effect of the internationalization of the field has not merely been the affirmation of the
validity of the existing knowledge base of the discipline. In the United States the results of
research are the most consistent with the ’original’ knowledge base. This can be deduced
most clearly from the research review conducted by Levine and Lezotte (1990; 1992).
Drawing on a large body of studies in the field of school effectiveness and school
improvement, they note the consistent tendency for certain school effectiveness
’correlates’ or factors to appear in virtually all studies reviewed as being linked with school
effectiveness. These factors are shown in table 1.



Table 1 Characteristics of unusually effective schools (Levine & Lezotte, 1990) 



Productive school climate and culture
    Orderly environment
    Staff commitment
    Problem solving orientation
    Staff cohesion, collaboration, consensus, communications
    Staff input into decision making
    Schoolwide emphasis on recognizing positive performance

Focus on student acquisition of central learning skills
    Maximum availability and use of time for learning
    Emphasis on mastery of central learning skills

Appropriate monitoring of student progress

Practice orientated staff development

Outstanding leadership
    Vigorous selection and replacement of teachers
    Maverick orientation and buffering
    Frequent monitoring of school activities
    High expenditure of time and energy for school improvement actions
    Support for teachers
    Acquisition of resources
    Superior instructional leadership
    Availability and effective utilization of instructional support personnel

Salient parental involvement

Effective instructional arrangements
    Successful grouping and related organizational arrangements
    Appropriate pacing and alignment
    Active/enriched learning
    Effective teaching practices
    Emphasis on higher order learning in assessing instructional outcomes
    Coordination in curriculum and instruction
    Easy availability of abundant, appropriate instructional materials
    Classroom adaption
    Stealing time for reading, language and maths

High operationalized expectations and requirements for students

Other possible correlates
    Student sense of efficacy
    Multi-cultural instruction and sensitivity
    Personal development of students
    Rigorous and equitable student promotions policies and practices

However, a different picture emerges from research on school effectiveness conducted in
other countries. In the Netherlands a growing body of studies has dealt, since the 1980’s,
with factors associated with school effectiveness. The results of these studies are
summarized in table 2. These results show clearly that Dutch researchers have consistently
been unable to establish in their schools the importance of the school effectiveness
correlates mentioned by Levine and Lezotte.



Table 2 School effectiveness factors supported by Dutch studies (n=42) 



factor                                 no. of studies       no. of studies showing
                                       addressing factor    positive significant relations

structured lessons/feedback                   17                      6
instructional leadership                      17                      3
orderly climate                               15                      6
student evaluation                            18                      6
whole class/differentiation                   10                      3
achievement orientation                       24                      9
team stability/teacher cooperation            15                      1
time/homework                                 17                      8



More or less the same holds true for school effectiveness research in the United Kingdom. 
Discussing the British research tradition in the field of school effectiveness Reynolds 
(1992) states that many of the early certainties in the British research paradigm have 
eroded as the field has developed. More specifically he points to the fact that in the United 
Kingdom the field of school effectiveness research has been affected by uncertainties 
related to the size of school effects and their consistency over time, the interrelationship of 
outcome variables and the precise factors responsible for differentially effective school 
processes. 

The Dutch-PSO programme 

The development delineated above, of increasing uncertainty about the validity of the
knowledge base of school effectiveness research, has led to a growing awareness of the
relevance of questions such as which factors are really responsible for differences in
effectiveness between schools, what the size of school effects is, and how far the results of
school effectiveness research can be generalized. One can point for instance to the
ISERP-project (Reynolds, Creemers, Bird & Parrel, 1994). This project aims to build on
existing models of ’good practice’ in terms of research design and to avoid the variation in
national studies’ research designs that limits transferability within and between countries.
It does so by using standard measures of inputs, processes and outcomes, common methods of
data analysis and common methods of data collection. As such, this project tries to answer
the question which factors are generalizable across countries and what the influence of
the context on these factors is. Another approach that tries to address questions about which
factors are relevant with respect to school effectiveness, the size of school effects and the
generalizability of school factors is the Dutch PSO-programme. In this paper the
background of this programme and some results will be presented.

The Dutch PSO-programme, undertaken by the University of Twente and the University of
Groningen, builds on the notion that there is growing uncertainty with respect to the
school effectiveness knowledge base. The underlying assumption is that some "hard"
questions should be asked about the existing school effectiveness models. These
models are seen not only as general and vague about the internal relationships between the
factors responsible for differences in effectiveness between schools, but also as uncertain
about the significance of the factors that are supposed to cause achievement.
The programme tries to take the next step in school effectiveness research through a number
of activities. One important activity is the application of a quantitative meta-analysis to
existing school effectiveness research, which, apart from making the available knowledge
base in our field more accessible, should sharpen our knowledge of which factors are and
which factors are not essential in explaining educational achievement. A second aim of this
analysis is to bear upon the question whether school effectiveness models are generalizable
or rather differential and context-specific in nature. Furthermore, the programme aims at a
theoretical reconstruction of school effectiveness models by means of analytic work using
relevant theories, focuses on evaluation practices within schools, and explores
alternative causal specifications of conceptual school effectiveness models using available
empirical databases.

In the next sections of this paper some preliminary results of the meta-analytic approach
are presented. First we deal with the question of the true size of school effects. That
section starts with an overview of the problems related to isolating the true effects of
schools, followed by the results of a quantitative meta-analysis on the size of school effects.
After that, attention is paid to the question whether educational leadership has a positive
and significant relationship with student achievement and what the effect size of this
variable might be.

THE ’TRUE’ SIZE OF SCHOOL EFFECTS

introduction 

As a first stage in the project mentioned, we consider the fundamental topic of the size
of school effects. Some problems are worth mentioning here. In school effectiveness
research one may wish to differentiate between four types of school effects:

1. the effect of a school on its pupils is their gross mean achievement score, expressed
as the deviation from the grand mean (the mean school effect), so the predicted score for
school j is the grand mean. Discussions about standards in education involve such a
notion of school effects.

2. the effect of a school on its pupils is the mean progress these pupils make in a given
time period. The predicted score in this case is based on the pupils’ initial achievement.
This kind of operationalization is often referred to as ’learning gain’.

3. the effect of a school on its pupils is the mean overachievement of its pupils. The
predicted score in this case is based on pupil background characteristics such as
socio-economic status, mental abilities and the like, that are known to have a substantial
effect on their achievement. This kind of operationalization is most widely used in school
effectiveness research.

4. the effect of a school on its pupils is the mean net progress these pupils make in as far
as this progress cannot be accounted for by relevant pupil background characteristics like
socio-economic status and so on. The predicted score in this case is based on the pupils’
background characteristics and their initial achievement. This combination is in effect used
in the ’Junior schools’ project (Mortimore et al., 1988).
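The difference between these operationalizations can be made concrete with a small sketch. It contrasts type 1 (deviation of the school mean from the grand mean) with type 2 (mean deviation from a score predicted from initial achievement). The function names, the pooled simple regression and the toy data are illustrative assumptions, not the procedure of any study reviewed here; types 3 and 4 would use background characteristics as (additional) predictors in the same way.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x, pooled over all pupils."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def school_effects(pupils):
    """pupils: list of (school, initial_score, final_score) tuples.

    Returns {school: (gross_effect, gain_effect)} where
    gross_effect = school mean minus grand mean (type 1) and
    gain_effect  = mean residual after predicting the final score
                   from initial achievement (type 2).
    """
    initial = [p for _, p, _ in pupils]
    final = [f for _, _, f in pupils]
    grand_mean = mean(final)
    a, b = fit_line(initial, final)
    rows = {}
    for school, p, f in pupils:
        rows.setdefault(school, []).append((f - grand_mean, f - (a + b * p)))
    return {
        school: (mean([g for g, _ in r]), mean([v for _, v in r]))
        for school, r in rows.items()
    }


# Hypothetical data: school A has the stronger intake, school B the larger gain.
toy = [("A", 20, 30), ("A", 30, 40), ("B", 10, 24), ("B", 20, 34)]
effects = school_effects(toy)
```

In this toy example school A has the higher gross effect, while school B has the higher effect once initial achievement is taken into account, which is why the choice of operationalization matters when results are pooled.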

Qualitative reviews ignore the fact that studies in the field of school effectiveness vary with
respect to the operational definition of "school effect". The only consolation is that
Bosker (1990) demonstrated empirically that the four operationalizations correlate highly
(.78 and more).



As an example of a quantitative approach to synthesizing the results of school
effectiveness research we present a meta-analysis of studies that assessed gross school
effects and/or school effects based on the idea of overachievement. We confine ourselves
to the UK and the Netherlands, to primary and secondary schools, to mathematics and
language as subject domains, and to those studies that used multi-level models or random
effects ANOVA. The studies selected for the meta-analysis are listed in the appendix.
The questions that we seek to answer are:

1. what is the size of the gross and overachievement based school effects?

2. does the size of the school effect vary across subject domains (mathematics and
language), sectors (primary and secondary), and/or country (UK and the
Netherlands)?

For the meta-analysis we apply the multilevel model as suggested by Raudenbush (1994).
We consider the selected studies as a sample from the population of studies on school
effects. Nested under each study are the secondary units: the schools. What we
consider as the size of the school effect is the estimated between school variance as a
proportion of the total variance in achievement (within and between schools).
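As an illustration of this effect size measure, the following sketch estimates the between school share of the total variance from raw scores with the classical one-way random effects ANOVA estimators. It assumes a balanced design (the same number of pupils in every school); the function name and data layout are hypothetical, and the studies reviewed may have used multi-level software instead.

```python
from statistics import mean


def school_effect_size(scores_by_school):
    """Between school variance as a proportion of total variance.

    scores_by_school: list of equally long lists of pupil scores,
    one inner list per school (balanced design assumed).
    """
    J = len(scores_by_school)
    n = len(scores_by_school[0])
    if J < 2 or n < 2 or any(len(s) != n for s in scores_by_school):
        raise ValueError("need at least 2 schools with equal size n >= 2")
    school_means = [mean(s) for s in scores_by_school]
    grand = mean(school_means)
    # Mean squares between and within schools.
    msb = n * sum((m - grand) ** 2 for m in school_means) / (J - 1)
    msw = sum(
        (x - m) ** 2
        for s, m in zip(scores_by_school, school_means)
        for x in s
    ) / (J * (n - 1))
    tau2 = max((msb - msw) / n, 0.0)  # between school variance, truncated at 0
    sigma2 = msw                      # within school variance
    return tau2 / (tau2 + sigma2)
```

For two hypothetical schools with scores [1, 3] and [5, 7] the mean squares are 16 (between) and 2 (within), so the between school variance is 7 and the effect size is 7/9.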

The multi-level model, starting with the within-study model, is:

(1) $\widehat{\mathrm{size}}_j = \mathrm{size}_j + e_j$

The effect size estimate in study j ($\widehat{\mathrm{size}}_j$) is an estimate of the population effect
size ($\mathrm{size}_j$) and the associated sampling error is $e_j$ (since in each study only a sample of
schools is studied).

The between-studies model is:

(2) $\mathrm{size}_j = \mathrm{intercept} + u_j$

In words: the true, unknown effect size estimated in study j ($\mathrm{size}_j$) is a function of the
effect size across studies (intercept) with random sampling error $u_j$ (since the studies are
sampled from a population of studies).

In assessing effects of subject domain, country, and sector, model (2) is extended to:

(3) $\mathrm{size}_j = \mathrm{intercept} + \gamma_1 \mathrm{subject}_j + \gamma_2 \mathrm{sector}_j + \gamma_3 \mathrm{country}_j + u_j$

Only a few of the studies reviewed mentioned standard errors for the estimated variance
components (the size of which depends, among other things, on the sample size used in the
study), and when they did, it was not always clear whether these standard errors referred
to the variance or to its square root. For this reason we roughly calculated the
standard errors from (cf. Longford, 1994, 58):

(4) $\mathrm{var}(\hat{\tau}^2) = \frac{2\sigma^4}{N} \left[ \frac{1}{n-1} + 2\omega + n\omega^2 \right]$

where $\tau^2$ is the between school variance, $\sigma^2$ is the within school variance, N is the total
sample size, n is the (average) number of pupils per school in the sample, and $\omega$ is the
variance ratio $\tau^2/\sigma^2$.

This approach to calculating the standard errors of the variance components is rather
crude, since we have to assume balanced designs and no predictors (which of course is
not so in the meta-analysis of the net between school variance).
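A minimal sketch of this rough calculation, reading formula (4) as var = (2 sigma^4 / N) [1/(n-1) + 2 omega + n omega^2] with omega the ratio of between to within school variance. The function name and the example values are hypothetical; the balanced design and the absence of predictors are the same assumptions as in the text.

```python
import math


def se_between_variance(tau2, sigma2, N, n):
    """Approximate standard error of the between school variance.

    tau2   : between school variance
    sigma2 : within school variance
    N      : total number of pupils in the study
    n      : (average) number of pupils per school
    """
    omega = tau2 / sigma2  # variance ratio
    variance = 2 * sigma2 ** 2 / N * (1 / (n - 1) + 2 * omega + n * omega ** 2)
    return math.sqrt(variance)


# Hypothetical study: 1000 pupils, 20 per school.
example_se = se_between_variance(tau2=0.1, sigma2=0.9, N=1000, n=20)
```

Because the bracketed term grows with n times omega squared, studies with many pupils per school but few schools yield comparatively imprecise estimates of the between school variance.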

results 

The results of the meta-analysis on the gross school effects are presented in table 3.

Table 3: Results for the meta-analysis on gross school effects

                                      effect      s.e.      p-value

mean gross school effect               .1386      .0100      .000
variance across studies                .0039                 .000

intercept gross school effect          .1594      .0134      .000
sector-effect (secondary)             -.0831      .0183      .000
subject-effect (mathematics)           .0261      .0164      .113
country-effect (uk)                    .0323      .0224      .140
variance across studies (res)          .0020                 .000

In the first two lines of the table the estimated mean gross school effect is presented:
across all studies the size of the school effect is .1386, and the variance across studies
indicates that the 95% interval (mean plus or minus 1.96 times the square root of this
variance) runs from .0162 to .2610. The remainder of the table presents the effects of
subject, sector, and country on the size of school effects: for language achievement in
primary education in the Netherlands the size of school effects is .1594 (the intercept),
whereas in secondary education it is .1594 - .0831 = .0763.
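The two calculations in this paragraph can be retraced as follows. The sketch uses only the coefficients reported in table 3; the function names are illustrative, and the coding (0 = primary, language, the Netherlands) is inferred from the labels in the table.

```python
import math


def interval_95(mean_effect, variance_across_studies):
    """Mean plus or minus 1.96 standard deviations of the study effects."""
    half_width = 1.96 * math.sqrt(variance_across_studies)
    return mean_effect - half_width, mean_effect + half_width


def predicted_gross_effect(secondary=0, mathematics=0, uk=0):
    """Predicted gross school effect from the moderator model in table 3."""
    return 0.1594 - 0.0831 * secondary + 0.0261 * mathematics + 0.0323 * uk


low, high = interval_95(0.1386, 0.0039)        # about .0162 and .2610
secondary_language_nl = predicted_gross_effect(secondary=1)  # about .0763
```

The same function gives the predicted value for any other combination, for instance secondary mathematics in the UK, by setting all three indicators to 1.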

The results with respect to the net size of school effects are presented in table 4:

Table 4: Results from the meta-analysis on net school effects

                                      effect      s.e.      p-value

mean net school effect                 .0711      .0062      .000
variance across studies (res)          .0010                 .000

intercept net school effect            .0727      .0111      .000
sector-effect (secondary)             -.0194      .0153      .177
subject-effect (mathematics)           .0268      .0123      .040
country-effect (uk)                    .0006      .0148      .396
variance across studies (res)          .0008                 .000

When the school effect is estimated after taking intake differences between schools into
account, the size of the school effect (which can now be interpreted as the net school effect)
diminishes to .0711, meaning that only 7 percent of the variance in achievement can be
accounted for by the schools that pupils attend. Furthermore the results indicate that the
size of the school effect for mathematics is .0268 (about 2.7 percentage points) higher than
for language. No sector effects and no country effects are found.

In appendix 2 the results of the meta-analysis per sector are presented. The results from
these analyses can be summarized as follows: differences in the size of net school effects can
only be demonstrated in primary education, where the estimated between school variance is
higher for mathematics and for the UK.



EDUCATIONAL LEADERSHIP 



introduction 

In the foregoing section we have tried to answer the question what the ’true’ size of school
effects may be. The second step in this project consists of determining, by means of a
meta-analytical approach, whether variables mentioned in the school effectiveness literature
have a positive relation with relevant output measures and what the estimated effect
size of these variables might be. One of the important variables in this respect is
(educational) leadership, and in this section we explore the question
whether this variable has a significant relationship with measures of student achievement
and what the effect size of this variable might be. Answering these questions is not an
easy task. One of the reasons for this is the diversity in the way this concept is
conceptualized and investigated. In this respect one can discern at least three approaches.
From a meta-analytical point of view the first two approaches are the least problematic.
These approaches use either a single concept of leadership or an overall (or ’latent’)
concept of leadership. In a meta-analysis we can then use the data reported
in these studies about the nature and size of the relationship between the single or latent
concept of leadership and the output measure used. Far more
problematic, however, are those studies that use different indicators of
leadership but do not discern an overall or ’latent’ concept of leadership. In many cases
these studies only report the size of the relationship between the indicators and the relevant
output measures, without giving any indication of the overall effect of these
indicators on the output measures used. Things get even more complicated when these
studies only supply data about those indicators that have a significant relationship with
the output measures used and give no information about the nature and size of the
effect of indicators that do not have a significant relation with the output measure used.
Examples in this respect are the studies of Mortimore et al. (1987) and the
IEA-reading study (Postlethwaite & Ross, 1993).

To deal with these problems we applied two meta-analytical procedures. The first
was the vote-counting procedure, which makes use of the number of positive results in
relation to the number of non-positive results (Bushman, 1994). We applied this procedure
in two ways. First, we investigated all studies involved in this analysis (see appendix 3)
from an ’overall’ perspective on leadership. For studies using a single or latent concept of
leadership this implied that we calculated the number of times these concepts had a positive
and significant relationship with the output measures used, in relation to the number of
times these concepts were used. For studies using multiple indicators only, we used the
following decision rule: leadership was considered to have a positive relation with the
outcome measure when at least half of all indicators had a positive
relationship with the output measure used.
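The decision rule for studies with multiple indicators only, and the subsequent tally, can be sketched as follows. The data layout (one list of booleans per study, True meaning a positive relationship for that indicator) and the function names are hypothetical.

```python
def leadership_vote(indicator_results):
    """True when at least half of the indicators show a positive relation.

    indicator_results: list of booleans, one per leadership indicator.
    """
    if not indicator_results:
        raise ValueError("a study needs at least one indicator")
    return 2 * sum(indicator_results) >= len(indicator_results)


def vote_count(studies):
    """Return (number of positive votes, number of possible positive votes)."""
    votes = [leadership_vote(results) for results in studies]
    return sum(votes), len(votes)


# Three hypothetical studies with 2, 3 and 4 indicators respectively.
positives, possible = vote_count(
    [[True, False], [True, False, False], [True, True, True, False]]
)
```

With these hypothetical inputs the first and third study count as positive and the second does not, so the tally is 2 positive results out of 3 possible.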

This procedure was then repeated from an ’indicator’ perspective. For instance, we
calculated the number of times the leadership indicator ’teacher evaluation’ had a positive,
significant result in relation to the number of times this indicator was studied.

The deficiencies of the vote-counting procedure, at least of the specific procedure we
used in this study, are that it does not take sample size into account and does not provide
an effect size estimate. To overcome these deficiencies a multi-level analysis was
performed. Since this kind of analysis was described earlier in this paper, we
restrict ourselves here to the following formula:

effect size (j) = intercept + gamma (1) country + gamma (2) method +
                  gamma (3) math + gamma (4) lang + gamma (5) sector

where:

country    0 = else, 1 = USA
method     (study design) 0 = gross, 1 = value added (correction for prior achievement
           and/or background variables)
math       0 = composite score for math and language, 1 = math score only
lang       0 = composite score for math and language, 1 = language score only
sector     0 = primary education, 1 = secondary education

Three things are important to note in this context. Firstly, only studies that supplied all
relevant data were used in this analysis. This remark is particularly relevant for studies
using multiple indicators only; studies with missing data were not used. Secondly, this
analysis deals with an ’overall’ perspective on leadership. This
implies that for studies using multiple indicators only we used the mean effect of the
different indicators on the output measure as the effect size for that
particular study. However, when data about the relationships between the indicators
themselves and the output measures were available, we repeated the analyses used in the
study at hand, most of the time simple regression analyses, and used the amount of
variance explained by these indicators to determine the ’overall’ effect size. Finally, the
effect size is expressed in $r_{xy}$ (the correlation between variable x and y), which in most
cases implies a transformation of the original data into this metric. For an overview of the
formulas involved we refer to Rosenthal (1994).
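To indicate what such transformations look like, the sketch below implements three widely used textbook conversions to r: from a t statistic, from a standardized mean difference d (equal group sizes assumed), and from a proportion of explained variance. Which conversions were actually applied to which study is not stated in this paper, so the selection and the function names are assumptions.

```python
import math


def r_from_t(t, df):
    """r = sqrt(t^2 / (t^2 + df)), carrying the sign of t."""
    return math.copysign(math.sqrt(t * t / (t * t + df)), t)


def r_from_d(d):
    """r = d / sqrt(d^2 + 4), valid for two groups of equal size."""
    return d / math.sqrt(d * d + 4)


def r_from_explained_variance(r_squared, positive=True):
    """r = sqrt(R^2); the sign must be supplied from the original study."""
    r = math.sqrt(r_squared)
    return r if positive else -r
```

For example, a t value of 2 with 12 degrees of freedom corresponds to r = .5, and 4 percent explained variance corresponds to r = .2 in absolute value.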

results 

The results of the vote-counting procedure are shown in tables 5 and 6. From table
5 it can be derived that it is not very likely that leadership has a positive
relationship with output measures indicating student achievement. Most studies fail to
come up with positive significant results.



Table 5: Results of the vote-counting procedure from an overall perspective on leadership

                                      math     language    composite score    total

no. of positive results                 3          2              2              7
no. of possible positive results       15         38              9             62
p-value                               .98       1.00            .97           1.00

Sign test (H0: π = .05; Ha: π > .5); Ha rejected in all cases

The same conclusion is easily reached from the indicator perspective. In most cases
indicators relating to the concept of educational leadership do not have a positive
relationship with measures giving insight into pupils’ achievement levels,
at least when we look at the indicators most commonly used in leadership studies.



Table 6: Results of the vote-counting procedure from an indicator perspective on leadership

indicator                             no. of positive          no. of possible positive
                                      significant relations    significant relations (p-value)

teacher evaluation/supervising                3                       36 (.00)
observation/class visits                      2                        8 (.03)
defining mission                              2                       12 (.03)
stressing academic standards                  3                       24 (.00)
involvement with instruction                  7                       39 (.00)
discussing objectives with staff              6                       28 (.00)
managing curriculum                           5                       17 (.01)
monitoring students                           4                       13 (.04)
development teachers/school                   7                       41 (.00)
staff participation                           1                        5 (.03)
providing resources                           1                        1 (1.00)*
visibility                                    2                        2 (1.00)*
keeping teacher morale high                   2                        2 (1.00)*
rewarding/punishing pupils                    0                       15 (.00)
pastoral care                                 4                       27 (.00)
community/parents                             3                       28 (.00)
safe and orderly climate                      2                        2 (1.00)*

Sign test (H0: π = .05; Ha: π > .5)
* = significant at .05


The results shown in tables 5 and 6 can be summarized by saying that it is not very likely
that leadership is related to pupils’ achievement. This conclusion is more or less confirmed
by the results from the multi-level analysis. Table 7 shows the estimated effect size of the
variable leadership and the variance across studies.


Table 7: Estimated effect size and variance across studies

                                      effect      s.e.      p-value

mean effect-size                       .0414      0.225      .075
variance across studies                .0106                 .000

The estimated mean effect size across all studies, expressed as mentioned before in r,
is .0414, which is significant at the 10% level (one-tailed).

The estimated variance across all studies is .0106, which indicates that the 95% prediction
interval around the mean effect size (mean plus or minus 1.96 times the square root of the
variance) runs from r = -.1604 to r = .2432.

The results regarding analyses trying to predict differences between effect sizes with study 
characteristics (or moderators) as subject matter, sector, country, and method of analysis 
show that two predictors have a significant relationship with the effect size (see table 8). 



Table 8: Predicting differences

                                      effect      s.e.      p-value

intercept                             -.1159      .0543      .043
country (USA)                          .1388      .0452      .005
study-design (value added)             .1080      .0488      .037
variance across studies (res)          .0065                 .006

This model shows that on average the effect of leadership on (uncorrected) student
outcomes is -.1159. In studies where the effect of leadership is assessed after taking
previous student achievement and/or background characteristics of students into account,
the effect is -.0079 (-.1159 + .1080). In words, after correcting for student background
and/or previous achievement there is no effect at all. Study design thus influences the
estimated effect size.

Looking at country differences, however, it turns out that US studies show significantly
higher effect sizes. If we take the value added studies, the estimated effect size for
the US is .1309 (-.0079 + .1388). In words, in the United States leadership seems to
matter when one wants to differentiate between ’good’ schools and ’bad’ schools.
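The sums above follow directly from the coefficients in table 8 and the 0/1 coding of country and study design. A minimal sketch, with an illustrative function name:

```python
def predicted_leadership_effect(usa=0, value_added=0):
    """Predicted effect size (r) of leadership from the model in table 8.

    usa         : 1 for US studies, 0 for studies from other countries
    value_added : 1 when prior achievement and/or background is controlled
    """
    return -0.1159 + 0.1388 * usa + 0.1080 * value_added


gross_elsewhere = predicted_leadership_effect(0, 0)        # -.1159
value_added_elsewhere = predicted_leadership_effect(0, 1)  # about -.0079
value_added_usa = predicted_leadership_effect(1, 1)        # about .1309
```

The fourth combination, gross effects in US studies, is not discussed in the text; under the same model it would come out at about .0229.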

Finally, one can deduce from table 8 that after having taken country and study-design 
differences into account, there are no longer residual differences between the studies in 
the estimated effect-sizes for leadership effects on student achievement. This implies that 
the moderators subject and sector hardly explain any of the variation between studies. 



DISCUSSION 

We have argued that the results of school effectiveness research can only be generalized
with great caution. A qualitative approach led to the conclusion that the
empirical basis for the effectiveness enhancing school factors is poor. The reasons why
new analyses of existing datasets and/or quantitative meta-analyses should be undertaken
are clear. To illustrate the meta-analytical approach used in our project, both the size of
school effects and the effect size of school leadership were estimated in a
quantitative way.

Regarding the school effect size we found that gross school effects had an estimated
magnitude of 14 percent and net school effects an estimated magnitude of 7 percent. It
turned out that sector, country and subject domain affected the estimated size of school
effects.

What then do we know about the importance of schools? First of all we have to be aware
of three pertinent problems:

1) the effect size is underestimated, since measurement error in the achievement tests
shows up as within school variance; if we took account of, say, 20 percent "noise",
the ratio of the between school variance to the total "true" variance would improve from
14 to 18 percent.

2) the effect size is overestimated, since the important intermediate level of the classroom
is ignored. Including the intermediate level would lead to an estimated decline in the "true"
size of the gross school effect to approximately half of it. Misspecification in this respect
leads to a statistically artificial increase in the size of school effects (e.g. Rowe and Hill,
1994).

3) we have assumed stable school effects, but other research (e.g. Bosker, 1992; Luyten,
1994) has shown that there is considerable variation between subjects, between cohorts,
between grades, between classes, and between groups of pupils with different
backgrounds. A further reduction of the true effect size thus seems plausible.
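The correction mentioned under the first point can be retraced with a short calculation. The sketch assumes that the 20 percent "noise" is a share of the total observed variance and falls entirely within schools, so that the between school variance is unchanged while the total shrinks; the function name is illustrative.

```python
def corrected_between_share(observed_share, noise_share):
    """Between school share of the 'true' (error-free) total variance.

    observed_share : between school variance / total observed variance
    noise_share    : measurement error variance / total observed variance
    """
    if not 0 <= noise_share < 1:
        raise ValueError("noise_share must lie in [0, 1)")
    return observed_share / (1 - noise_share)


corrected = corrected_between_share(0.14, 0.20)  # 0.175, i.e. about 18 percent
```

Under these assumptions 14 percent becomes 17.5 percent, which rounds to the 18 percent mentioned in the text.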

All in all the true gross effect may be something like 10 percent, and the true net effect
something like 6 percent. Is this "much ado about nothing"? A technical answer can be
given following the conceptual idea that one school affects all of its pupils. The importance
of the school effect can then be assessed by looking at the school total of deviations, to
which the within school variance and the between school variance contribute in the ratio
1 : $n\tau^2/\sigma^2$, with $\tau^2$ the between school variance and $\sigma^2$ the within school variance
(Longford, 1994, 27-28). The trick lies in the premultiplication with n. This may be the
number of pupils per school in the sample, or a value deemed important a priori (e.g. the
total number of pupils in a cohort, or even better: the total number of pupils leaving the
school over a number of years). The net between school variance is then as important as the
within school variance if we consider a small class of 20 pupils per school. But if we consider
consistently and stably performing secondary schools that each serve 1,000 pupils over a
period of 5 years, the relative importance of the school is 50 times as high as the within
school variation. It thus seems a matter of taste to judge something as important or not. Our
contention would be: much ado about something, and quite rightly so!
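The argument about premultiplication with n can be checked numerically. The sketch below assumes a ratio of between to within school variance of .05; this value is not stated in the text but is inferred from its statement that parity is reached at 20 pupils per school. The function names are illustrative.

```python
def variance_ratio_from_share(between_share):
    """Convert a between school share of total variance into tau^2 / sigma^2."""
    return between_share / (1 - between_share)


def relative_importance(n, variance_ratio):
    """Contribution of between school variance relative to within school
    variance (set to 1) in the school total of deviations: n * tau^2 / sigma^2."""
    return n * variance_ratio


ASSUMED_RATIO = 0.05  # inferred from the parity at n = 20, not given explicitly

small_school = relative_importance(20, ASSUMED_RATIO)    # about 1: parity
large_school = relative_importance(1000, ASSUMED_RATIO)  # about 50
```

A ratio of .05 corresponds to a between school share of just under 5 percent of the total variance, which is of the same order as the net effect discussed above.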

More problematic in this respect are of course the results regarding leadership. Since the
overall contribution of educational leadership to student achievement is about zero, we
might argue that all the fuss about educational leadership must be based on ideological
rather than on empirical grounds. However, since our results also show that there are large
differences between educational contexts (US studies showing a positive relationship of
r = .1309 between educational leadership and achievement, while this result cannot be found
in other countries), our conclusion might be that we should not forget about this variable
when thinking about effective schools. The most valid conclusion then is that
educational leadership does matter in certain educational contexts, but that this effect is
not generalizable to other educational contexts. This in turn leads to the more
fundamental question why leadership is an important variable in the United States and not
in other countries of the world.

The last remarks in this paper concern the status of our conclusions. It is important
to note that our conclusions are based on preliminary findings. For
instance, our results about the size of school effects deal only with studies conducted in
two countries. Furthermore, there are still some technical problems to be solved regarding
our meta-analytical approach. Here one can think for instance of questions such as whether
it is possible to adjust our results for unreliability of the (in)dependent variables used in
the studies at hand and whether we should apply procedures that adjust our
meta-analytical results for publication bias.

Appendix 2: Meta-analyses per sector

Table 5: Results from the meta-analysis on gross primary school effects
Table 6: Results from the meta-analysis on net primary school effects
Table 7: Results from the meta-analysis on gross secondary school effects
Table 8: Results from the meta-analysis on net secondary school effects

Appendix 4: Descriptive statistics of studies used in multi-level analyses to
determine the effect size of leadership